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Preface 


This document is a final technical report of the results of research performed under Task 
5 (Methods Task M2), subtask 2 of NASA contract NASl-18444, Computational Struc- 
tural Mechanics (CSM) Research- This report summarizes an evaluation of matrix data 
structures for use with advanced CSM algorithms and applications. The material pre- 
sented herein was derived directly from studies and discussions among several Lockheed 
researchers with considerable expertise in the programming, use, and architecture of ma- 
trix methods for computational structural finite element analysis. Much knowledge and 
experience from the programming and use of the STAGS matrix methods were provided 
by Mr. Frank Brogan and Dr. Charles Rankin, who have kept STAGS at the forefront of 
advanced nonlinear analysis methods research for the past decade. Background theoret- 
ical and implementational details were provided by Dr. Bahram Nour-Omid, Ms. Mary 
Wright, and Dr. David Kang. Valuable technical background was also provided by Mr. 
Phil Underwood and Drs. Gary Stanley and Donald Flaggs. 
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1. Introduction 


Forming and manipulating large system matrices are key computational elements in 
the solution of large, complicated structural mechanics problems. In the typical case, these 
matrices are symmetric and sparsely populated, but of very large order, where the number 
of degrees of freedom may range from 100 to 100,000. Much research has been devoted 
to the formulation of data storage schemes and computational algorithms that minimize 
the computational costs associated with critical matrix manipulations. An example is the 
development of equation reordering algorithms to minimize the data storage and number of 
arithmetic operations required for the triangular factoring of sparse matrices. The benefits 
of such developments have been widespread and have enabled the numerical analysis of 
very large and complicated structures to be conducted economically on present computing 
machines. 

Today, the advancement of Computational Structural Mechanics (CSM) as an effec- 
tive engineering tool is still focused on increasing economy, although it considers not only 
economy of computational resources but economy of personnel resources as well. Accord- 
ingly, new methods such as automatic model error estimation and nonlinear substructuring 
are being developed to ease the computational burden on both the analyst and the com- 
puter. The CSM Testbed software system (see ref. 1) is intended to aid the development 
and implementation of these new methods by providing a common environment for the 
development and dissemination of advanced CSM algorithms and procedures. As such, 
the Testbed must have features which make it amenable to the incorporation of new, per- 
haps unforeseen, numerical operations. The purposes of this document are to identify the 
computational matrix algebra capabilities required for the extension of the CSM Testbed’s 
algorithmic capabilities, and to evaluate the suitability of certain matrix data structures 
to accommodate these extensions. 

This document is divided into three sections. The first section describes data storage 
schemes presently used by the CSM Testbed sparse matrix facilities and similar, skyline 
(profile) matrix facilities. The second section contains a discussion of certain features 
required for the implementation of particular advanced CSM algorithms, and how these 
features might be incorporated into the data storage schemes described previously. The 
third section presents recommendations, based on the discussions of the prior sections, 
for directing future CSM Testbed development to provide necessary matrix facilities for 
advanced algorithm implementation and use. 

The discussion presented in the following pages is necessarily limited since to evaluate 
all promising matrix data structures and their software would require efforts far in excess 
of the scope of the present work. Instead, the discussion is concentrated on the details of 
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only the Testbed sparse matrix and generic skyline matrix data structures and software. 
This narrower focus provides for a deeper examination of these two computational matrix 
structures and their applicability to advanced CSM algorithms than would otherwise be 
possible. The objective of this document is to lend insight into the matrix structures 
discussed and to help explain the process of evaluating alternative matrix data structures 
and utilities for subsequent use in the CSM Testbed. 
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2. Matrix Data Structures: Sparse and Skyline Schemes 

This section describes the data storage structures of the Testbed sparse matrix and of a 
generic, skyline-stored, profiled, symmetric matrix. Throughout this section the example 
finite element model depicted in Figure 1 will be referenced. This example is a simple 
finite-element model comprising five beam elements, two triangular plate elements, and 
one quadrilateral plate element. For purposes of illustration, all six nodes are assumed to 
have six active degrees-of-freedom (d.o.f.), providing a total of 36 degrees of freedom in the 
entire model. The nodal degrees of freedom are numbered in the conventional sense; one 
through three being associated with translational motions in the x , y and z directions and 
four through six being associated with rotations about the x, y and z axes, respectively. 
Note that the degrees of freedom associated with all translations at node 1 and with y and 
z translations at node 4 are suppressed by support boundary conditions. 


Example Problem Model: 

6 nodes 
6 d.o.f./node 



Element 
# type 

1 beam 

2 beam 

3 beam 

4 beam 

5 beam 

6 plate 

7 plate 

8 plate 


Connected 

Nodes 

I# 2 

2, 3 

3, 4 

2, 5 

3, 6 

1/ 2, 5 

3,4,6 
2, 3, 6, 


5 


Figure 1. Example Finite Element Model. 

2.1 Testbed Sparse Matrix Structure 

The Testbed sparse matrix data structure is a nodal-block oriented scheme for storing 
the elements of the upper triangle of a sparse, symmetric system matrix (see refs. 2 and 
3). The Testbed sparse matrix is stored in one of two forms depending on whether the 
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matrix has been factored. The logical structure of the unfactored Testbed sparse matrix 
using the interrelationships of the nodal-block submatrices is shown in Figure 2 for the 
example problem. Note that each box like 


( 1 , 2 ) 

denotes a 6 by 6 nodal-block submatrix connected to the two nodes listed in parentheses 
inside the box. In the example above, the block indicated contains the coupling contribu- 
tions from nodes 1 and 2. In the example problem, elements 1 and 6 contribute to this 
nodal-block submatrix (1,2). In the example matrix of Figure 2, the only block whose 
terms are present in the factored matrix but absent in the unfactored matrix is marked 
with a large “x.” 


1 

(1,1) 

(1,2) 



(1,5) 



(2, 2) 

( 2 , 3 ) 





( 3 , 3 ) 

(3,4) 

(3, 5) 

(3,6) 



(4,4) 

x 

(4, 6) 

symmetric 


(5,5) 

(5, 6) 





(6, 6) 



Indicates 6 by 6 nodal-block submatrices. In 
the case at left, the submatrix due to element 
connectivity between nodes 1 and 5 is depicted. 

Indicates nodal-block submatrix that is not present 
in the model stiffness, but will fill in during 
factoring. 


Figure 2. Sparse Matrix Nodal-Block Structure. 
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Both factored and unfactored Testbed sparse matrices are stored in a blocked, parti- 
tioned record scheme. Individual records are of constant length and contain both indexing 
data and matrix values. The indexing data are useful only as integer type, but are stored 
physically in the unfactored matrix structure in the same datum precision as the terms of 
the matrix itself. In the factored matrix structure, however, the indexing data are stored as 
integer type regardless of the datum precision of the matrix values. The record partitions 
differ in detail between the factored and unfactored matrix structures, owing primarily to 
the incorporation of constraint (d.o.f. suppression) information into the factored matrix 
structure. 

The record partitioning scheme and record contents for the unfactored stiffness matrix 
of the example problem shown in Figure 1 are presented in Figures 3 and 4. To make the 
substance of Figure 4 more illustrative, a record length (LREC) of 384 words has been 
chosen. The fundamental unit of information in the record partitioning scheme is the 
nodal-block subrecord. The nodal-block subrecord comprises nodal index information and 
all nodal-block submatrices that contribute to the rows assigned to the diagonal-block 
node in the upper triangle of the system matrix. The first node listed in the nodal index 
is referred to as the diagonal-block node since its nodal block appears on the diagonal 
of the system matrix. The nodal index information includes the number of nodal-block 
submatrices present in the subrecord (for the current diagonal-block node) and the node 
numbers associated with the columns of these nodal-block submatrices. The size of each 
nodal-block submatrix is the square of the maximum number of degrees of freedom at 
each node. This value is obtained from the START com m and of processor TAB in that 
nodal degrees of freedom constrained throughout the model are specified. For example, 
the command START 100 6, would constrain d.o.f. 6 (normal rotation) at 100 nodes, and 
the maximum number of degrees of freedom at each node is then 5. 

Note that the records are partitioned so that complete nodal subrecords are contained 
within one record, t.e., the matrix information associated with a nodal-block row of the 
matrix is not allowed to span record boundaries. Thus, the record size is used only as a 
data manager parameter, and transmits no specific information about the matrix itself, or 
how the record partitions are to be interpreted. All interpretive information is encountered 
sequentially as the record is processed from the first word through the LREC t/l word. 

The record partitioning scheme and record contents for the factored stiffness matrix 
of the Figure 1 example problem are presented in Figures 5 and 6. The first entry in 
the nodal index (see Figure 5) is the number of the node contributing to the diagonal 
submatrix block of this nodal-block row of the factored matrix. This node is referred to as 
the “diagonal-block” node. The second entry in the nodal index is the number of degrees- 
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of-freedom active for this diagonal-block node (in the range 1 through 6). Following this 
number are the local degree-of-freedom numbers associated with these active degrees-of- 
freedom. Each of these local degree-of-freedom numbers are unique and in the range 1 
through 6. Following the local degree-of-freedom numbers is the number of off-diagonal 
nodal submatrices appearing in this row of the factored matrix. The final entries in the 
nodal index are the numbers of the nodes contributing to the off-diagonal nodal-block 
submatrices. For purposes of illustration, a record length (LRA) of 384 words was chosen 
for the detail of the record contents in Figure 6, and only the first record is shown. The 
subscripts of the D * and L terms in Figure 6 refer to degree of freedom numbers, assigned 
sequentially in groups of six to each node. 

Data Records: n records of length LREC words. 



Subrecord 

header 


Key 


Contents 

number of nodal-block rows in the upper 
triangle of the system matrix contained in 
this record. 


nodal index |J|j number of nodes contributing to the 

nodal-block submatrices in this row and 
the numbers of these nodes. 


nodal-block 

submatrices 


6 by 6 submatrices of matrix coefficients 
in the rows of the upper triangle of the 
matrix connected to the nodes listed in 
the nodal index. 


Figure 3. Record Partitioning Scheme for Testbed Unfactored Sparse Matrix. 
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Record 1: 


0, 5, 1 1 3 1 


3 

BOB 

CM 

i — 1 

ESI 


DD 

^^^M8Eaaag3sggss5Bsssa 


h 1 B i H i 


188, 


337, 


(2, 3) I 4 3 4 5 6 (3,3) | (3,4) 


(3,5) 


(3, 6) 


3 8 4 1 Record contents: 

3 nodal-block rows 

9 nodal-block submatrices (36 words each) 
337 words used (47 words null fill) 


null fill 


Record 2: 


I 4 

1 7 «! 

1 

3 

2 4 6 

(4,4) (4,6) 

2 5 6 

(5,5) 

(5,6) 


\y//////////A^ 

J-Uj-Ul.Li.l l.iilliJLt 1 1 Hi'irbi 


m \ t i i i * ! 1 jj j 


189, 

1 6 j (6, 5) | null fill 


Record contents: 

3 nodal-block rows 

5 nodal-block submatrices (36 words each) 
189 words used (195 words null fill) 


Contents 

number of nodal-block rows in the upper 
triangle of the system mat rix contained in 
this record. 

number of nodes contributing to the 
nodal-block submatrices in this row and 
the numbers of these nodes. 

6 by 6 submatrices of matrix coefficients 
Figure 4. Unfactored Sparse Matrix Record Contents for Example Problem. 


Subrecord 

header 


Key 


nodal index 


nodal-block 

submatrices 




384, 


j 
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Data Records: n records of length LRA words. 



Subrecord 

Key 

Contents 

header 


Number of nodal-block subrecords in this 
record. 

pointers 

us 

Physical (word) pointers to start of each 
subrecord in this record. 

nodal index 

■ 

Number of active d.o.f. and d.o.f. indices for 
current node, number of nodal-block row 
submatrices to follow and the numbers of the 
nodes associated with these row submatrices. 

factored row 
nodal- block 
submatrices 

58 

Nodal-blocks of rows of terms in the 
upper triangle of the factored matrix 
for each active d.o.f. of the current node. 


Figure 5. Record Partitioning Scheme for Testbed Factored Matrix. 
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5 76 15811 3 4 5 6 3 2 5 |*** D ( j A) L (4J) L (4 , 6 


* * * * n 






I ■■■■■>»■■■■■■» |glgggggll»gg"' 




2612345623 \ D 


■■■■■*■*■* ■■** *■; 




| ■■■■■■■■■■■■■>■■■■ ■■■■■*■■****** 




■■■■■■■»■■■■■«■■■**■***»»***»*»****»*»***** 


(10,10)^ (10.11)' ' ■ ^ (10,18) * * * D(ll.l!)L(Jl t J2)' ' ' ^(11,18) (12,7) 


!■■!■« ■■■ , * 1 ”»***";;i!!!igS!SlgSSSSKSKiK'8H”i”S!8'SS5'SSSSi””S 


sf: 9 |c sfe $ $ D J 

* J (12.I2) L (I2J3) 


urn 


36123456445 


(13,13) ^( 13,14) 


(14,14) 14,15) 


■■■■«■■■«■■■■■■■ ■■•■■•■■■**■*■■■*■ ****j 


III!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 







0(15,15) L(i5,]6) ■ ■ 

■^(15.36) * * * D(it,j6) L( I6 j 7 ) . . 

■ L(i6,36) 


A/7. 771 L 


’(17.17) •-‘(17,18) ■ ■ ■ *-(17.36) 


L f 17.36) ***** D(18,18) L(18,J9) 


313 

• ^(18,36) 


■■■■■■■■■■■■*• •■•■■■*■*■■ ■■■■■! 


iiiifiiiiipiiSi****»*I***»****|g*|*|*I**??lif g*Ii*?*l 


* - Denotes an unused, subdiagonal entry. 

Figure 6. Record Contents for Example Problem’s Testbed Factored Matrix. 















As in the unfactored matrix structure, nodal subrecords in the factored matrix axe 
not allowed to span record boundaries. Unlike the unfactored matrix structure, constraint 
data associated with suppression of nodal degrees of freedom are included in the matrix 
data records. The factored matrix rows corresponding to suppressed degrees of freedom 
are not included in the data. A map is provided at the beginning of the nodal subrecord 
to indicate the active degrees of freedom, as an indexed subset (l,...,n) of the degrees 
of freedom not constrained on the START card in TAB, for the current diagonal node. 
An interesting observation is that the factored matrix data cannot be decoded completely 
without additional information about the number of degrees of freedom per node in the 
finite element model, and which nodal degrees of freedom are potentially active. In the 
Testbed, this information is obtained from a modeling summary dataset JDF1.BTAB.1.8 . 

As an aside, one should note that the rather elaborate record partitioning schemes 
used for the Testbed matrices are by-products of the architecture of the underlying data 
management system (DAL). Three DAL features in particular are responsible for the 
original Testbed design choices to place indexing and matrix value data side-by-side in the 
data records and to break the matrix storage into fixed-length segments (*.«., records). 
These features are: 

1) DAL is a singly indexed, hierarchical data manager, so to group data 
in logically related sets frequently requires the use of inhomogenous 
data records within a single dataset. 

2) DAL handles datasets containing fixed-length records only. Different 
records in the same dataset cannot have different lengths. 

3) DAL is sector (physical disk block) addressable at the finest granular- 
ity. Thus, it is required that integer numbers of disk blocks be read 
or written through DAL. For practical core memory limitations and 
the most efficient use of disk space, the large matrices are blocked 
into records that are sized to integer-multiples of the disk block size. 

The pertinent observation to be made at this point is that the structure of matrix data is 
influenced not only by the structure of the matrix itself (in terms of zero and nonzero coeffi- 
cients), but also by the operational characteristics of auxiliary data management software. 
Herein lies the most intimate connection between the algebraic and data descriptions of 
the system matrix. 
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2.2 Generic Skyline Matrix Structure 


The generic skyline matrix data structure, as employed by the SKYNOM software 
system ( e.g ., see ref. 4), is an equation-based, profiled matrix storage scheme that uses a 
positional index to provide access to the rows of the lower triangle of a sparse, symmetric 
system matrix. The profiled matrix form takes advantage of sparsity on an equation-by- 
equation basis, rather than on a nodal basis as in the Testbed sparse matrix structure, by 
including only those terms from the first nonzero entry to the diagonal in each row. An 
integer index vector, called the diagonal pointer vector, defines row boundaries within a 
single, contiguous matrix values record by pointing to each individual diagonal term. 


The profile structure of the system matrix for the example problem is shown in Figure 
7. The entire shaded area in this Figure consists of individual matrix terms, each of which 
occupies a word of storage in a single large matrix values record. The size of the matrix 
values record is equal to the value of the last entry in the diagonal pointer vector. Table 
1 lists the diagonal pointer values for all 36 degrees of freedom of the example problem. 
Note that negative diagonal pointers indicate suppressed degrees of freedom in the model. 
The length of any particular row is the difference between the absolute values of that row’s 
diagonal pointer and the previous row’s diagonal pointer (the zeroth row’s diagonal pointer 
is zero). 



Table 1. Example Skyline Matrix Diagonal Pointers. 
(Table is 36 x 2) 


d.o.f. 

Diagonal 

Pointer 

d.o.f. 

Diagonal 

Pointer 

d.o.f. 

Diagonal 

Pointer 

1 

-1 

13 

85 

25 

217 

2 

-3 

14 

93 

26 

243 

3 

-6 

15 

102 

’ll 

270 

4 

lo": 

16 

112 

28 

298 

5 

15 

17 

123 

29 

327 

6 

21 

18 

135 

30 

357 

7 

28 

19 

142 

31 

376 

8 

36 

20 

-150 

32 

396 

9 

45 

21 

-159 

33 

417 

10 

55 

22 

169 

34 

439 

11 

66 

23 

180 

35 

462 

12 

78 

24 

192 

36 

486 
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ORIGINAL PAGE IS 
OF POOR QUALITY 




Matrix terms associated with all 36 
d.o.f. (Lower triangular portion only) 



Terms equal to zero (identically) 
included in profile. 


Figure 7. Skyline Matrix Structure for Example Problem. 
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The skyline matrix is stored in two records: an integer record for the diagonal pointer 
vector and a floating-point record for the matrix values. In the example problem, the 
matrix values record is 48G words long. The diagonal pointers and matrix values records 
for the example problem matrix are depicted in Figure 8. 


The present implementation of this skyline matrix data structure uses a word- 
addressable data manager (GAL-DBM, see ref. 5) so explicit blocking and record par- 
titioning schemes are not necessary. In fact, with GAL-DBM’s capability to read and 
write partial records, dynamic blocking of the system matrix depending on operational 
requirements and available workspace was implemented in the SKYNOM package. Dy- 
namic blocking of the matrix ensures the most efficient utilization of storage for matrix 
manipulations, and provides direct benefits in terms of reduced CPU and I/O costs as 
available memory is increased. Furthermore, the analyst does not need to be concerned 
about the details of managing record length and available memory workspace. 



( 36 words) 


(486 words) 


Figure 8. Skyline Matrix Data Structure for Example Problem. 
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3. Algorithm Requirements for Matrix Data Structures 


This section explores application-oriented issues of matrix data structures. First, use 
of the Testbed sparse and skyline matrix data structures in basic algebraic operations 
required by any finite-element analysis is discussed, and the performance of existing soft- 
ware for these operations is presented. Second, the applicability of the Testbed sparse 
and skyline matrix structures to the execution of advanced algorithms and capabilities 
in the Testbed is discussed. This latter section focuses in particular on application of 
the matrix data structures to the use of multipoint constraints, substructuring, advanced 
nonlinear algorithms, and p-version finite elements. Most of the discussion in this section 
draws on examples from linear formulations of algorithms. Adaptation to nonlinear for- 
mulations generally requires only the usual extensions such as iteration procedures and 
re-formation of system matrices, and thus simply represents a multiple use of the partic- 
ular advanced capabilities. The exceptions to this are the algorithms discussed in Section 
3.2.5 for traversing limit and bifurcation points. 

3.1 Basic Operations 

Both of the basic system matrix organizations and data structures described in the 
previous section have proven suitable for use in conventional finite-element analysis. Con- 
ventional operations in which matrices of these types are applied include those listed in 
Table 2. In addition to these operations, the critical operations of creating both the 
structure (topology) and the actual data of the system matrices are not to be neglected, 
although the details of these processes are not presented here. 

Table 2. Basic Finite-Element System Matrix Operations 


Operation 

Algebraic Representation 

Combine: 

A = aK + /?M 

Multiply: 

x = Ay 

Factor: 

A = LDL r 

Solve: 

Ax — y 

Forward reduction: 

Lw = y 

Backward substitution: 

L t x = w 

Eigenvalues: 

(K - AM)^ = 0 
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3.1.1 Software Measured Performance 

A primary concern in the utility of any given matrix data structure is the efficiency 
with which it can be manipulated by computational routines. Any comparison of diverse 
matrix data structures should therefore contain some measure of computational efficiency. 
However, one must be certain to define carefully the parameters and limits of validity of 
such measures in order to be specific regarding the nature of the comparison. 

A simple set of comparisons was made between basic matrix operations in the Testbed 
sparse matrix environment and in the SKYNOM skyline matrix environment. The com- 
parisons were based on elapsed CPU time and direct I/O requests required for matrix 
factoring, matrix-vector multiplication, and matrix equation solution using a previously- 
factored matrix. The matrix data for the comparison study were created using the Testbed 
software and translated to a skyline format for use with SKYNOM. Three finite element 
models were used as a basis for comparison; a 1818 d.o.f. space mast (singly laced, 101- 
bay triangular truss), a coarsely discretized pinched quarter-cylinder model (square mesh) 
with 486 d.o.f., and a 1734 d.o.f. version of the pinched quarter-cylinder model. A variety 
of nodal resequencing strategies was employed for the cylinder cases to gain the best fac- 
toring performance for each matrix type. In all, five different matrix cases were used for 
comparison. 

Performance data (CPU time and direct I/O requests) were taken immediately before 
and after the issuing of a command to invoke the operation being measured. Since both 
the Testbed and SKYNOM are command-driven, some overhead is incurred in command 
parsing, and this overhead is included in the measurements. To keep the comparisons on 
a common footing, each software package was allowed to use up to 200,000 32-bit words as 
a workspace, although no attempt was made to optimize the Testbed sparse matrix record 
length with respect to workspace usage. In addition, one should note that SKYNOM s 
automatic allocation of workspace and dynamic matrix blocking requires measured CPU 
and I/O resources not required by the explicitly blocked sparse matrix. All calculations 
were made in double (64-bit) precision. All executions were made in a single job stream 
on a VAX-ll/785 computer system to eliminate as much machine-environment variability 
as possible. 
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The results of the performance comparison studies are presented in Tables 3 and 4 for 
the five cases investigated. Notable aspects of these results are as follows: 

o The nested dissection ordering works well for the sparse matrix struc- 
ture but not very well at all for the skyline matrix structure. The 
Gibbs, Poole, Stockmeyer (GPS) algorithm (see ref. 6) seems to work 
the best for the skyline matrix structure, but causes factoring times 
for the sparse matrix structure to rise significantly. 

o CPU and I/O demands for factoring operations using good equation 
orderings for each system are competitive. In all cases tested, the best 
skyline matrix factor times are slightly lower than the best sparse ma- 
trix factor times. In some cases, the I/O activity for skyline factoring 
is significantly greater than that for sparse factoring. 

The solve operation using the Testbed sparse matrix is invariably and 
significantly slower and more demanding of I/O resources than the 
solve operation using the skyline matrix structure. 

CPU demands for the matrix- vector multiply operation are compet- 
itive for each matrix structure with ideal equation orderings. I/O 
demands using the skyline matrix structure are generally lower than 
those using the sparse matrix structure. 

Not surprisingly, different equation reorderings have different effects on the opera- 
tional requirements of different matrix data structures. These effects are primarily noticed 
in factoring costs, since the factoring operation is quite sensitive to the amount of fill-in in 
the sparse matrix structure and to bandwidth in the skyline matrix. The nested dissection 
ordering minimizes fill-in and thus tends also to minimize the number of arithmetic oper- 
ations required in factoring. The performance of the Testbed sparse factoring operation is 
particularly troubling in view of this trend since more time is taken to execute presumably 
fewer arithmetic operations than the skyline factoring using GPS ordering. Another dis- 
turbing feature of the sparse matrix structure and software is the poor performance of the 
solve operation. This drawback can be a significant hindrance when executing nonlinear 
or iterative algorithms and eigenvalue extractions. 

The performance of the Testbed sparse matrix software is indicative of significant 
overhead in the software unrelated to the actual numerical operations. Areas from which 
this overhead may arise include an inefficient use of available memory workspace and 
excessive I/O to load and cycle through many blocks of the topological and matrix data. 
The dynamic memory workspace allocation capability of the skyline matrix software seems 
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to provide an advantage primarily in reduction of I/O requests for solve and multiply 
operations. In these cases, SKYNOM is able to fit more of the matrix into the workspace 
at once, causing fewer I/O requests but more words transferred per request. On machines 
where I/O requests dominate I/O costs, this savings can be significant. 

Making the Testbed software use available memory effectively is difficult since user 
intervention is required to modulate the dataset record sizes of KMAP, AMAP , K and INV 
datasets (see ref. 7). For fixed-memory applications, even optimal sizing of these datasets’ 
records would result in a loss of efficiency in the solve operation since the maximum size of 
the factored matrix records is determined in the factoring process and the solve operation 
is processed on a record-by-record basis. Thus, basic inefficiencies are incurred by the use 
of the fixed-length, explicitly blocked matrix data structure. 

3.1.2 Software Theoretical Performance 

The operation of schemes for minimizing the system matrix storage and fill-in during 
factorization is central to the effective use of sparse matrix manipulation software. Results 
presented in Tables 3 and 4 for the square-mesh, pinched cylinder case strongly indicate 
this dependence, showing a factor of 2.5 better performance for factoring a Testbed sparse 
matrix with a fill-minimizing ordering versus a profile-minimizing ordering. This section 
presents an attempt to rank the various matrix ordering schemes objectively, j'.e., without 
consideration of extraneous software overhead. The number of floating-point operations 
(flops) required to factor a matrix arising from a square-mesh discretization is used as the 
figure of merit. 

The statistics for factoring a matrix arising from a 17 x 17 node mesh and from a 
31 x 31 node mesh are presented in Table 5. All data presented in Table 5 are nodally 
based, t'.e., one equation per node. The flop-counts assume that no trivial arithmetic is 
done. All matrix reorderings were computed by the Testbed’s RSEQ processor (see ref. 
1). The row- by-row ordering referenced in Table 5 is obtained simply by numbering the 
nodes sequentially along each row of the mesh. 
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Table 3. Study Case Parameters for Matrix Operation Performance Comparison. 


Model 

Designation 

Number of 

d.o.f. 

Resequencing 

Strategy 

Matrix Size Parameters 

id* 

icS** 

bw^ 

l) 101-Bay Mast 

1818 

none 

4010 

1406 

1.4% 

2) Pinched Cylinder 

486 

Nested Dissection 

3833 

725 

22 % 

3) Pinched Cylinder 

486 

Gibbs, Poole, Stockmeyer 

6617 

949 

14% 

4) Pinched Cylinder 

1734 

Nested Dissection 

31829 

3865 

15% 

5) Pinched Cylinder 

1734 

Gibbs, Poole, Stockmeyer 

80385 

6317 

7.4% 


* Number of submatrix multiplications required to factor the system matrix [ref. 1, p. 0.3-3) 
Proportional to number of submatrix multiplications to solve system (ref. 1, p. 0.3-4) 
f Normalized Skyline Semi-Bandwidth (number of matrix terms / (number of d.o.f.)^ ) 


Table 4. Execution Statistics for Matrix Operation Performance Comparison. 


Model 

Performance 

Testbed Operation 

SKYNOM Operation 

Number of 

Measure 

Factor 

Solve 

Multiply 

Factor 

Solve 

Multiply 

1 

CPU sec. 

11.84 

5.47 

1.65 

9.05 

1.97 

1.70 


Dir. I/O Req. 

301 

256 

71 

523 

27 

24 

2 

CPU sec. 

9.19 

4.10 

1.22 

8.13 

0.95 

0.55 


Dir. I/O Req. 

155 

178 

58 

185 

27 

22 

3 

CPU sec. 


3.79 


mwm 

0.66 

0.46 


Dir. I/O Req. 

175 

171 

53 
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4 



9.83 

2.94 

183.24 

7.93 

4.32 


Dir. I/O Req. 

| 

436 

114 


141 

78 

5 

CPU sec. 


12.80 

3.08 

58.56 

3.91 

2.50 


Dir. I/O Req. 

| 

600 

115 



48 


Table 5. Theoretical Factoring Performance Comparison. 


Resequencing 

Strategy 

Mesh 

Size 

Non-zeroes t 1 ) 

flops 

Fill-In 

Average 
Loop Size^ 2 ) 

Nested Dissection 

17 x 17 

3964 

30400 

2619 

6 

Row-by-Row 

17 x 17 

5185 

45400 

3840 

9 

Gibbs, Poole, Stockmeyer 

17 x 17 

6185 

78000 

4840 

12 

Nested Dissection 

31 x 31 

18379 

228000 

13758 

9 

Row-by-Row 

31 x 31 

30721 

485000 

26100 

16 

Gibbs, Poole, Stockmeyer 

31 x 31 

38406 

849000 

33785 

22 


(1) - Nonzero elements in factored matrix 

(2) - average size of SAXPY loop 
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From the data presented in Table 5, one can readily see that the nested dissection or- 
dering theoretically requires many fewer arithmetic operations for factoring than either the 
natural, row-by-row ordering or the Gibbs, Poole, Stockmeyer ordering. Also, the ratio of 
flops between the nested dissection and Gibbs, Poole, Stockmeyer cases is reasonably close 
to the ratio of CPU times for these two cases in Table 4. One should note, however, that 
the flop-counts for the row-by-row or Gibbs, Poole, Stockmeyer orderings do not correlate 
with the CPU time statistics when comparing Testbed and SKYNOM performance. In 
particular, the Testbed factoring on the 17 x 17 node mesh with nested dissection ordering 
exhibits a computational average rate of approximately 376 nodal flops per CPU second 
whereas the SKYNOM factoring of the same matrix ordered according to the Gibbs, Poole, 
Stockmeyer algorithm exhibits a computational rate of approximately 775 nodal flops per 
second (discounting trivial arithmetic). 

3.2 Advanced Operations 

Virtually all advanced matrix operations can be described as a suitable combination 
of the more elementary operations listed in Table 2. Indeed, this is how the algebraic 
descriptions of advanced algorithms are evolved. Two factors impede the adoption of an 
analogous numerical approach, however. These factors are the consideration of efficiency 
in numerical operations and the restrictions arising from matrix data structures. 

Considerations of numerical efficiency often drive algorithms deep into code, since 
that is where computational efficiency is most obtainable. Thus, a question arising for 
advanced algorithm development is: 

o “How easy is the modification of the basic operations’ code to accom- 
modate necessary new operations?” 

As the operations’ code is constructed to operate using a particular structure of matrix 
data, this first question is directly linked to another: 

o “How amenable is the matrix data structure to the new operation’s 
requirements?” 

The answers to these related questions, with respect to the Testbed sparse and generic 
skyline matrix software, are the subject of the following sections. The following discussion 
explores these questions in light of the requirements of specific near-term Testbed advanced 
algorithms and capabilities including general constraints, substructuring, advanced Riks 
procedures, equivalence transformation procedures, and p-version finite elements. 
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3.2.1 Incorporation of General Constraints — Background 


The most general methods for incorporating constraint conditions into a linear system 
of equations are to use Lagrange multipliers or penalty factors. The Lagrange multiplier 
method has the advantage of satisfying the constraint exactly, but at the expense of added 
Lagrange multiplier degrees of freedom. The penalty method avoids the additional degrees 
of freedom, but satisfies the constraint only approximately. When penalty factors are not 
well scaled, they may reduce the precision of the overall solution or cause the system 
matrix to become ill-conditioned (ref. 8). A third method for imposing constraints is the 
actual elimination of dependent degrees of freedom according to the constraint relations. 
This method is more difficult to apply since system matrix coefficients must be augmented 
to include the effects of the eliminated, dependent degrees of freedom in the retained, 
independent equations. This method has the advantages that the c ons t ra int is satisfied 
exactly and that the problem size is reduced to the number of independent degrees of 
freedom only. The drawbacks of this method are that it requires dual sets of node-d.o.f. 
mappings, global modification of the system matrices, and special handling of solution 
vectors to expand them back to the full system. 

The mathematical descriptions of these three constraint-enforcement methods all re- 
quire that the constraint conditions be expressed as 

P r q c +R r q u = 0 ( 1 ) 

where q^. are the constrained (dependent) degrees of freedom and q u are the unconstrained 
(independent) degrees of freedom. The total system vector is then 


q = 


q u 

q c 


For the constraint set to be minimal and consistent, P must be a square, nonsingular 
matrix of dimension equal to the number of constrained degrees of freedom, n c . The matrix 
R is rectangular with the number of rows equal to the number of independent degrees of 
freedom, n u , and the number of columns equal to n c . Equation (l) can also be expressed 
in terms of the total system degrees of freedom, n as 


L r q = 0 


( 2 ) 


where n = n u + n c and L is an n x n c rectangular matrix 

R 

L = ' 
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The Lagrange multiplier method augments the original system potential energy 

7T = ^q r Kq - q T f 

with the constraints of Eq. (2) and appropriate Lagrange multipliers A; 

W = *q T Kq -f- q T LA - q r f. 

Setting the first variation with respect to the q , and A, to zero produces 

L r q = 0 
Kq + LA = f. 

Equations (3) can be combined into a single system 



which becomes the new model system used for subsequent, constrained solutions. 

In the penalty method, the constraints are enforced by applying large, internal stiff- 
nesses acting through L. The potential energy of the constrained system is written 

W = ^q r Kq + ^fcq r LL T q - q T f 

and setting the first variation to zero produces 

(K + /cLL r )q = f. 

The penalty parameter, k, is chosen to make the terms in LL r a few orders of mag- 
nitude larger than their counterparts in K so that the penalty stiffness dominates over the 
model stiffness and produces solutions which approximately satisfy the constraint. How- 
ever, the addition of several constraints with large penalty parameters can degrade the 
precision of the solution by causing some significant model stiffness terms to be treated as 
small with respect to the penalty terms. 
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The degree of freedom elimination method uses only the unconstrained degrees of 
freedom in the constrained system by eliminating the q c . The elimination is accomplished 
by writing the q c in terms of the q u from Eq. (1); 

q c = P r R T q u . 


Writing the total system vector in terms of the unconstrained degrees of freedom only gives 


q = 


M = 

I 

3 

►H 

l 

J 

_P“ r R r j 


{q u } = £ 


T 

q« 


( 4 ) 


where I u is the identity matrix of rank n u . Substituting the right-hand side of Eq. (4) into 
the expression for the system potential energy and setting the first variation with respect 
to the unconstrained degrees of freedom to zero gives 

LKL T q u = If (5a) 


The new, reduced system to be solved is 

Kq u = f (5b) 


where K = LKL and f = Lf and the total system degrees of freedom are computed 
from the solution q u Hby Eq. (4). One should note, however, that the magnitudes of the 
terms in P directly affect the conditioning of the constrained system. In particular, if P 
is poorly conditioned, K will also be poorly conditioned and the actual constraint relation 
may become numerically singular, i.c., P may not be numerically invertable. 

3.2.2 Incorporation of General Constraints — Implementation 

For any of the three constraint methods described above, it is necessary to define the 
constraints, t.e., to define L, L or R and P. The only aspects to be determined for a 
particular finite-element program system are (i) how to choose a constraint method, (ii) 
how to define the q c , q u in terms of data already accessible to the system, and (iii) how 
to define the constraint coefficients in L. 

In the current CSM Testbed, the user has control over the definition of nodes, elements, 
and table or system vector entries. Degrees of freedom are implicitly (never explicitly) 
assigned to nodes, and the coupling of the equilibrium equations is determined by nodal 
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connectivity rather than degree of freedom connectivity. With these operational features, 
the constraint data must be defined in nodal terms. 

The nodal-block organization of the Testbed sparse matrix requires that constraint 
data be associated with nodes. Accordingly, Lagrange multipliers must be associated with 
a “dummy” node, t.e., a node with no structural model degree of freedom. The Lagrange 
multiplier method, as implemented in SPAR Level 13A (ref. 9), required the use of the 
experimental element facility to define a stiffness matrix format containing the L matrix 
for a constraint element connected tom + 1 nodes, where m is the number of nodes having 
degrees of freedom corresponding to nonzero terms in L. The number of constraints that 
could be associated with a single dummy node would be up to the number of active degrees 
of freedom per node. Node-to-node constraints are the only kind of constraints that are 
implemented easily in this system. 

The penalty method could be readily applied to the Testbed sparse matrix in the 
same manner as the Lagrange multipliers were incorporated but without the necessity for 
dummy nodes. In this case, a penalty element matrix could be defined to be assembled 
directly with all other structural elements. 

The degree of freedom elimination scheme is difficult to apply to the Testbed sparse 
matrix structure. The implementation would most likely assemble the portions of the com- 
plete matrix associated with independent and dependent degrees of freedom in separate 
nodal-block matrices and combine them appropriately to produce the global, reduced sys- 
tem matrix K. The topology analysis must also account for the connectivity of the nodes 
of the dependent degrees of freedom in consideration of nodal connectivity of each con- 
stituent independent degree of freedom. Structures analogous to the JDF1.BTAB.1.8 and 
CON.. neon datasets (see ref. 7) are required for use with the reduced system since system 
vectors are shortened and constraint data are not as straightforward as simple degree of 
freedom suppression. Special consideration is needed for the flagging of constrained degrees 
of freedom that have been eliminated, since no constraint types in addition to “prescribed 
zero” and “prescribed nonzero” can be accommodated in the structure of CON.. neon and 
the constraint relations do not fit into the structure of an APPL.MOTI system vector. 

Incorporation of constraints into the skyline matrix structure would be more straight- 
forward than incorporation into the Testbed sparse matrix structure simply because the 
skyline structure is based on degrees of freedom rather than on nodes. Furthermore, 
since no integrated finite-element software analogous to the Testbed has been built up 
around the generic skyline matrix package, no potential conflicts exist between d.o.f.-by- 
d.o.f. augmentation or reduction of the equation system and auxiliary system data like 
the JDFl.BTAB dataset used by the Testbed software. Also, the pointer vector to matrix 
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diagonal terms (referred to as a diagonal-pointer vector) provides a single, minimally sized 
map of the contents of the single matrix values record, making access to any portion of 
the skyline-stored matrix straightforward. However, if bandwidth minimization schemes 
are employed, equivalence constraint information must be taken into account in the deter- 
mination of degree of freedom connectivity in order to preserve efficiency in the factoring 
of the constrained matrix. 

Lagrangian constraints would be implemented into the skyline matrix structure in a 
manner analogous to the Testbed sparse matrix implementation - as a special “constraint 
element.” The constraint element in this case would have a minimum of three degrees 
of freedom, rather than three associated nodes (18 degrees of freedom), and would be 
included directly in the topology analysis and matrix assembly processes. Incorporation of 
penalty constraints in the skyline matrix structure is also quite straightforward and would 
be accomplished through the device of a constraint element formed by a special-purpose 
processor. This approach would follow a new development implemented into the STAGS 
program (see refs. 10 and 11) which used an augmented Lagrangian approach to define 
multi-point constraints, eliminating past problems with equation ordering and otherwise 
unconnected structures. 

The degree of freedom elimination procedure for the skyline matrix data structure is 
almost as difficult as for the Testbed sparse matrix, although none of the incompatibilities 
with other model data records are present. The structure is simplified through the use 
of the diagonal pointer vector, which is defined by only the lowest-referenced degree of 
freedom in any equation, rather than by all connected nodes (or degrees of freedom). 
In the skyline format as in the Testbed sparse format, matrix coefficients would most 
probably best be calculated on a term-by-term basis and assembled into a reduced-order 
system matrix. 

3,2.3 Substructuring Operations - Background 


The use of substructures promises great payoff in reducing the size of very large 
analysis problems. In global-local analysis, substructuring is a crucial procedure to remove 
slowly varying^ global phenomena from a highly detailed, perhaps nonlinear, local model 
without ad hoc assumptions regarding local boundary conditions. 


The procedure used in substructuring is to express the degrees of freedom associated 
with some subregion of the domain, called the interior domain, in terms of the degrees of 
freedom in some other subregion of the domain, called the boundary domain. To accomplish 
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this, the following partition is used for the system equilibrium equation: 


K,, 

K«b " 


l 

K&6 

U 6 ; 



( 6 ) 


The interior degrees of freedom, q, , are eliminated to produce the equilibrium equation in 
terms of the boundary degrees of freedom: 

(K 66 - K^K^K.b) q 6 = f 6 - Kf b K^ l ti (7) 

with 

<J. = K-‘ (f. - K jt q t ) 


Note that when a diagonal-scale factoring of the interior matrix K,j is used (K,-,- = 

LDL r ); 

ib = w r D _I w 


where 

L r w = K,b 


The computation of w requires only a single-pass reduction and can save approximately 
half the cost of a full, forward-reduction and back-substitution solution procedure. 

For multiple substructures, each interior domain is expressed in terms of a set of 
retained boundary degrees of freedom, whose matrix coefficients are assembled into the 
final retained equation. In the nonlinear case, degrees of freedom associated with the 
nonlinear domain are also included in the retained system, which is then solved using 
nonlinear procedures. 

An alternative approach to substructuring, and that taken in the substructuring capa- 
bility presently implemented in the CSM Testbed, is to express the motion of the boundary 
as a linear combination of suitable boundary functions, by. This approach is from SPAR 
Level 13 A (ref. 9). These boundary functions are the total displacement pattern of the in- 
terior and boundary degrees of freedom under a unit-imposed displacement at exactly one 
boundary degree of freedom, with all others suppressed. Thus, the number of boundary 
functions is equal to the number of boundary nodes times the number of degrees of freedom 
per node. The by also contain contributions to the motions of the interior domain degrees 
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of freedom. The remainder of the interior motions are described through a user-defined 
set of interior displacement functions, qy. The displacement of the entire domain is 

q = ^2 h i x j + Y,VkVk (8) 

j - 1 fc=l 


where nj, are the boundary degrees of freedom, n, are the interior displacement functions, 
and the xy and y& are the amplitudes of the j th boundary function and k tft interior dis- 
placement function, respectively. Denoting the matrix whose columns are the by as B and 
the matrix whose columns are the q fc as Q, the equilibrium Eq. (6) can be rewritten as 


b t kb 

0 



( 9 ) 


The decoupled system of Eq. (9) can be solved for the amplitudes of the boundary functions 
and the interior displacements. For the static problem, an exact solution is obtained when 
the Q contains an interior displacement vector due to f, with f{> = 0. Otherwise, an 
approximate solution will result in the interior domain causing the global solution also to 
be only approximate. 

3.2.4 Substructuring Operations - Implementation 

The Testbed sparse matrix substructuring scheme requires the following calculations: 

1) Form the full system matrix K. 

2) Factor K with all boundary degrees of freedom prescribed and solve 
for the boundary functions by. 

3) Solve for the interior displacement functions q fc . 

^ ji ^ 

4) Calculate B T KB and Q KQ and store as full, lower-triangular 
matrices. 

5) Factor B r KB and Q KQ and solve Eq. (9). 

6) Expand the solution vectors x and y into full system solution vectors 
q = Bx + Qy. 

The steps indicated above are implemented in the CSM Testbed for dynamic analysis 
through several processors: K. INV. SSOL. AUS/SSPREP. AUS/SSM. AUS/SSK. SYN. 
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STRP, and SSBT. At present, the detailed structure of the matrices generated by SYN is 
not known. Neither is the compatibility of SYN matrices with INV and AUS/PROD known. 
One characteristic that is documented is that SYN must work with models using no less 
than 6 degrees of freedom per node. The only Testbed substructuring procedure presently 
fully documented for use is for substructured vibration eigenvalue analysis. 

Management of substructures in the Testbed is accomplished by segregating individual 
substructures into separate data libraries. Thus, confusion over domain-specific tabular 
data ( e.g ., JDFl, ALTR, JLOC, JREF and CON ) is avoided in operations using “full 
model” processors like INV and SSOL for individual substructure systems. Thus each 
substructure model can retain its own local system matrices. 

Aside from the definition of substructure interior and boundary domains, two unique 
capabilities are required to implement a partitioned substructuring scheme effectively. 
These are (a) a capability to extract or construct the boundary coupling matrix K, b and 
(b) the upper-triangular matrix solution procedure to determine w. 

The K n, matrix can be formed directly from the boundary functions matrix B by 
extracting only those terms corresponding to interior degrees of freedom, but computation 
of the B matrix requires n b complete solutions (forward and backward substitutions). 
Computation of the w matrix requires half the work, but assumes the coefficients of K l{) 
are available in a vector-block form. The complete I£, b is, of course, already available in 
the assembled system K, but extraction from that structure would be difficult, especially 
using the Testbed sparse matrix data structure. A more effective approach might be to 
use a substructure assembler processor that woul f d assemble the K„, Kw> and K, b directly 
from constituent element matrices. In this way, bandwidth minimization techniques could 
be applied independently to the diagonal submatrices K,-,- and K bb . 

A very general matrix assembly routine would also aid in forming the Schur comple- 
ment 

(K W - KSK^Ktt) = (K 6b -w r D- 1 w) 

since the matrix product will, in general, be full while the may be sparse. At present, 
no capability exists for combining Testbed sparse matrices with full, square matrices or 
matrices of differing topologies. 

The skyline matrix organization offers little formulative advantage over the Testbed 
sparse matrix system if an approach similar to that taken presently in the Testbed is 
followed, i.e., the boundary-function matrix reduction is used. If, on the other hand, 
a partitioning scheme were to be used, the picture changes slightly since significantly 
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more control over the submatrix assembly process could be designed into a needed skyline 
matrix assembler. Also, the triangular-matrix solution operation is presently available for 
the skyline matrix structure. 

3.2.5 Advanced Solution Algorithms - Requirements 

Advanced nonlinear solution algorithms, like the Riks (see refs. 12 and 13) and 
Crisfield (see ref. 14) methods, generally require some special handling of system ma- 
trices near limit and bifurcation points. The equivalence transformation algorithm (see 
refs. 15 and 16) has been developed to select appropriate solution paths beyond imminent 
bifurcations. Many of these algorithms treat the critical points using a reduced system 
calculated by transformation of the full system. 

While the requirements of each nonlinear solution algorithm differ in detail to a great 
extent, common elements can be extracted and used to formulate a generic toolkit of neces- 
sary linear algebraic operations. Such an approach has been taken in the STAGS program 
wherein a group of utilities, called the Z-systtm (see ref. 11), has been created and is 
presently being used to provide the computational functionality necessary for executing 
the enhanced Riks and equivalence transformation algorithms. The functionality of the 
Z-system encompasses the basic finite-element operations enumerated in Table 2 plus addi- 
tional operations to (a) suppress and enable individual degrees of freedom dynamically, (b) 
extract values of degrees of freedom, (c) assemble system vectors at the element level, and 
(d) compute matrix-vector products on an element-by-element basis. The basic Z-system 
functionality is presented in Table 6. Many functions are simply vector manipulations and, 
as such, do not depend on the architecture of system matrix software or data structures. 

3.2.6 Advanced Solution Algorithms - Implementation 

Emulation of the Z-system functionality using the CSM Testbed sparse matrix struc- 
ture requires minimal enhancement, providing the combination of system matrices is re- 
stricted to matrices of identical structure, and software is constructed or extracted to 
modify the packed-word entries in the dataset CON.. neon outside of the processor TAB. 
A capability for vector assembly is necessary in the Testbed, as is one for element-by- 
element matrix- vector multiplication (function MXVEC in Table 6). Based on experience 
with STAGS, these two operations can save significant storage and I/O costs during non- 
linear iterations. 
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Table 6. Basic STAGS Z-System Functions 


Function 

Algebraic Representation 

ZADDCvl ,al , v 2 ,a 2 , v 3 ,a 3 ) 

V3 = aiVi + C*2 V 2 + OL 3V3 

ZASS(sky.kl.k2.s.a) 

A = Kj + 0K2 

ZD 0 T(vl ,v 2 ,s) 

a = vx * v 2 

ZFACT(a.v) 

A <= L(A = LDL t ), v <= diag{D} 

ZFIXCvl , v 2 ) 

V2, =0 iff = 0 

ZGET(v,n,id,s) 

St=Vidi, t = l,n 

ZMAX(v.s.i) 

0 = max{vi } 

ZM 0 VE(vl , v2) 

v 2 = Vi 

ZMULT (vl , v 2 , v 3 ) 

»3| = v u v 2i 

ZMXVEC(k,vl ,v 2 ,n) 

v 2i =Kvi i , * = l,n 

ZPUT(v.n.id.s) 

Vid { = Si, t = l,n 

ZSET(v.s) 

v«- = a, % — 1 , n 

ZSMULCf ,s) 

b 

II 


One difficulty with the Testbed is the treatment of prescribed nonzero degrees of 
freedom in the solution procedure. Presently, the internal force contributions of specified 
degrees of freedom axe calculated as a by-product of the forward reduction procedure during 
solution. The matrix factoring procedure is constructed so as to keep appropriate stiffness 
terms in the factored matrix to enable this linear internal force to be computed correctly. 
In nonlinear iterations, however, it is most desirable to calculate the nonlinear internal 
forces in element-level software in order to update the residual force vector correctly. This 
approach makes using the Testbed’s SSOL processor in nonlinear iterations awkward, since 
different constraint values need to be employed at different stages of the solution process 
in order to ensure the correct calculation of the residual force vector. 

Use of a skyline organization in the Z-system functions would be slightly more straight- 
forward than use of the Testbed sparse matrix structure simply because of the inherent 
d.o.f.-by-d.o.f. organization of the skyline matrix. This skyline format makes single- 
equation updates much more straightforward. Also, considerable experience is available 
with the data architecture in the STAGS implementation, which uses a skyline matrix 
structure. 

3.2.7 Hierarchical Convergence Elements (p- Version) - Background 

New finite-elements are currently being developed that use increasing orders of orthog- 
onal polynomials to estimate convergence behavior and discretization errors in structural 
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models. The displacement formulation of these elements, called p-version finite elements, 
describes element displacement fields in terms of amplitudes of assumed orthogonal func- 
tions. Thus, the degrees of freedom in a p-version element are not necessarily nodal dis- 
placement components, but are amplitudes of assumed-displacement functions which may 
have zero or nonzero values at the element nodes. The hierarchical displacement functions 
are analogous to conventional finite-element interpolation functions, but are chosen in such 
a way that the contributions of higher-order functions are orthogonal to all lower-order 
functions in the hierarchy. 

Because of the hierarchical nature of the assumed shape functions, system matrices 
assembled for a given element order are strictly a subset of stiffness matrices for higher 
element orders. This attribute is made possible by the orthogonality of the assumed 
functions, and can result in significant savings in forming and factoring successively higher- 
order system matrices. Such economy enables the practical estimation of convergence and 
errors of discretization in p-version finite-element models. 

3.2.8 Hierarchical Convergence Elements (p-Version) — Implementation 


The essential aspects of the p-version formulation as far as system matrix implemen- 
tation is concerned are: 

1) Elements (or, equivalently, nodes) can be associated with a highly 
variable number of degrees of freedom 

2) Submatrix factorization and solution capabilities can provide signifi- 
cant computational savings in operation 

The first point above makes the p-version finite elements very difficult to integrate 
with the present Testbed sparse matrix structure. This difficulty is due to the nodal- 
block orientation of the sparse matrix and the hard-wired fact that nodes in the Testbed 
cannot be associated with more than 6 degrees of freedom. Schemes to circumvent this 
restriction are either to alleviate the restriction of 6 degrees of freedom per node, or to 
allow provisions to create “dummy” or “pseudo” nodes that can be used as dictated by the 
flow of the refinement process. The first approach requires several coded sixes ( “6” ) to be 
parameterized throughout the Testbed and would without question be a tedious process 
whose correct completion would be virtually impossible to assure. The second approach 
is awkward and eliminates the utility of the system vector data structure, since system 
vectors would have to be sized to the upper bound. Storage and I/O costs could easily 
become excessive just carrying around the excess nodes - especially in the early stages of 
the analysis where perhaps a majority of the nodes would not be needed. 
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Implementation of the p-version elements in a skyline matrix organization is much 
more straightforward, since the matrix structure is natural degree of freedom oriented, 
and can be dynamically extended without changing the minority of the matrix. Thus, 
the skyline-stored profiled matrix has a natural hierarchical structure, analogous to the p- 
version elements themselves. One present difficulty in implementing the p-version elements 
in a skyline matrix system is the fact that the GAL data manager does not presently 
allow dynamic extension of record boundaries. Therefore, the p-version would have to be 
implemented in such a way so as to create new matrix records at each increase of order, 
but the majority of the contents in these records would be simply copied, rather than 
recomputed. 

An interesting issue about economy of storage and factorization costs arises when 
one considers the alternatives for growing the skyline matrix structure as the order of the 
element functions grows. This issue is illustrated in Figure 9, where a basic skyline matrix 
is shown along with two possible growth versions of the matrix. In the scenario depicted 
on the left, the additional terms due to p-refinement have been simply added to the end 
of the matrix. This alternative is referred to as the matrix augmentation approach. In 
the scenario depicted on the right, the additional terms have been assembled directly into 
the matrix interior. This alternative is called the matrix re-assembly approach. As can 
be seen from the relative dimensions of the two matrices in the lower half of Figure 9, the 
matrix on the right has a lower maximum bandwidth, a lower average bandwidth, and a 
lower total profile. Presumably, factorization costs for the matrix on the right would be 
less than for the matrix on the left, however, the factored portion of the matrix up to 
the first row corresponding to added terms remains unaltered and can thus be re-used in 
operations involving the factors of the augmented matrix. 

Intuitively, it seems natural that some breakpoint would exist when the size of the ma- 
trix and the number of additional terms due to p-refinement reach values that equalize the 
cost of the two approaches; augmentation and partial factorization versus re-assembly and 
complete factorization. Both methods would therefore be useful generally in p-version ele- 
ments and algorithms. A skyline matrix structure could be made to admit both approaches 
providing very general matrix assembly and matrix factoring utilities are developed. The 
assembly would necessarily have to account for augmentation and insertion of matrix rows, 
and the factoring would have to be able to begin from a partially factored matrix state. To 
admit further research in the area of computational methods for p-version finite elements, 
both schemes should be made available in any general package of matrix operations. 
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One should note that hierarchical convergence elements based on non-orthogonal func- 
tions could also be developed and used. The matrix issues for this case are largely the 
same as for the case of orthogonal function hierarchies, with the exception that the matrix 
augmentation approach is no longer possible since all parts of an element’s matrix are 
modified upon refinement. 


Basic Matrix Structure 
for lowest -order polynomials 



Element Contributions 

Element No. 

1 

- 2 

3 

4 

5 



Matrix after p-version Refinement of Elements 1, 2 and 5 



Figure 9. Alternative Structures for p- Version Skyline Matrix Growth 
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3.3 Summary 

A summary of some key points addressed in the foregoing discussions of advanced 
capability implementation appears in Table 7. This Table is intended for quick reference 
to the key points brought out in the previous text. Conclusions and discussion based on 
these points is presented in Section 4. 


Table 7. Key Aspects of Matrix Data Structures and Software 
For Basic and Advanced Operations and Algorithms 


Operation 

Testbed. Sparse Matrix 

Generic Skyline Matrix 

Performance on Basic Op- 
erations: 

Factor 

Multiply 

Solve 

+ Topology information 
explicitly stored. 

— Inefficient implementation. 

Good overall with proper order- 
ing. Flop rates lower than ex- 
pected. 

Good 

Notably slow 

+ Dynamic out-of-core blocking 

Good overall with proper order- 
ing. Requires more flops than 
sparse format. 

Good 

Good 

Constraints: 
d.o.f. suppression 

Lagrange multiplier 

Penalty element 
d.o.f. equivalencing 

Cryptic packed data 

Node-to-node constraint straight- 
forward. Awkward for general 

case because of nodal architec- 
ture 

Straightforward 

Very difficult due to nodal archi- 
tecture and supporting data struc- 
ture assumptions. 

Simple sign flag 
Straightforward 

Straightforward 

Must be built into new assembler. 

Substructuring 

Present approach costly and spe- 
cialized 

Need substructure assembler 

Triangular matrix solve available 
Need substructure assembler 

Advanced Solution Algo- 
rithms 

Basic operations available. Treat- 
ment of specified displacements 
in solution is awkward. Dynamic 
d.o.f. suppression needed. 

Straightforward Z-system as in 
STAGS 

p-version Finite Elements 

Severely restricted by nodal ar- 
chitecture and 6 d.o.f./node as- 
sumption 

Straightforward. Amenable to ei- 
ther reassembly or augmentation 
approaches witn partial factoriza- 
tion 
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4. Recommendations for Testbed Matrix Development 


The preceding discussions focused on the extension of current Testbed capabilities to 
accommodate a variety of advanced analysis capabilities. Assumed in this discussion was 
that the characteristics of the most basic levels - the system matrix manipulation software 
and data structures - determine the feasibility of upgrading of the Testbed’s algorithmic 
capabilities. This assumption is not strictly true, since high-level programming l an guages 
afford sufficient flexibility to implement almost any scheme regardless of how intricate 
or particular it may be. But given the stated purpose of the Testbed (to aid technology 
advancement by integrating CSM research and development through a common, extendable 
software architecture), the more basic approach seems more likely to succeed by laying a 
solid foundation for further Testbed and CSM development. The discussion presented in 
this section proceeds from this premise and focuses on defining a development path that 
simultaneously assures flexibility in use and extension, and efficiency in operation. 

This section is divided into three main parts. The first part discusses the implemen- 
tation of new matrix data structures and associated computational facilities in the CSM 
Testbed. Particular attention is given to the implementation of a generic skyline matrix. 
The second part focuses on the design of a generic environment for further development 
of matrix methods in the CSM Testbed and for incorporation of alternative useful matrix 
data structures and computational modules. The third section contains some concluding 
comments and observations about matrix methods development in the CSM Testbed. 

4.1 Incorporation of New Matrix Schemes 

Incorporating new matrix structures and computational utilities into the CSM Testbed, 
whether to supplant or augment the present sparse matrix capability, requires several spe- 
cific requirements to accommodate the advanced algorithms discussed in Section 2. These 
requirements are: 

1) A flexible, substructure-oriented topology analysis and system matrix 
assembler. Critical features of the assembler include the ability to as- 
semble diagonal and off-diagonal substructure matrix blocks, ability 
to use degree of freedom or nodal resequencing lists calculated by 
external utilities, and the ability to assemble only specified elements 
and/or groups of elements. The ability to handle d.o.f.-equivalence 
constraint matrices is desired, but not required, provided that a ca- 
pability to assemble Lagrangian or penalty constraints is provided. 
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2) Matrix computational facilities to perform all of the basic operations 
listed in Tables 2 and 5 including a separate facility for either the 
triangular matrix solve operation or direct computation of the Schur 
complement. These facilities require a data interface to the Testbed 
system-vector data structure. 

3) A facility to enable the dynamic suppression of selected degrees of 
freedom for use in conjunction with advanced nonlinear continuation 
algorithms. 

Any matrix data structure and associated computational software to be implemented 
in the Testbed should be extensively documented as to structure and usage. Since the 
CSM Testbed is to serve as an integrating platform for advanced methods, many of which 
are yet to be defined in detail, the operational particulars of the Testbed software must be 
as clear as possible. This requirement is especially critical with respect to matrix software 
and data structures since matrix algebra operations form the computational cornerstone 
of numerical algorithms. 

4.2 Incorporation of a Skyline Matrix Scheme 

A logical first step toward providing advanced matrix capabilities in the CSM Testbed 
would be the implementation of the generic skyline matrix data structure and utilities. This 
development is supported primarily by considerations of flexibility and an unquantified no- 
tion that algorithm development time and effort would be reduced through the availability 
of a simple and flexible matrix data structure, particularly in cases where system matrix 
data must be specially modified or accessed in algorithmic operations. 

The adoption of a skyline matrix scheme will not adversely affect the computational 
efficiency of matrix operations in the Testbed. The computational efficiency of well-coded 
skyline matrix utilities is, at the very least, competitive with the Testbed sparse matrix 
utilities, based on the results of the performance survey described in Section 3.1.2. The 
primary advantages of the SKYNOM skyline matrix software from a performance stand- 
point arises from its flexible use of memory resources to reduce I/O costs for out-of-core 
solutions, and its native use of the GAL nominal data manager. Potential advantages also 
should be noted for executions on vector processing computers, where the length of the 
matrix row vectors in the skyline structure makes possible significant savings through the 
use of vector arithmetic. 

The flexibility of a skyline matrix data structure is due to its simplicity. The Testbed 
sparse matrix structure is blocked into artificially fixed-length records, each an amalgam 
of indexing and matrix data that cannot be decoded without access to external data 
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structures. The skyline matrix data structure, on the other hand, is self-contained and 
separates indexing and matrix values data into two simple records. The entire structure of 
the skyline matrix is available through examination of a single, system- vector-sized index. 
The word-addressable capability of the GAL data manager makes access to portions of 
the skyline matrix values record straightforward. A useful addition to the skyline matrix 
data structure would be to incorporate topological data as discrete record groups in the 
matrix dataset. These data would preferably describe the equation ordering of the matrix 
and the element-equation connectivity. In causes of substructure matrices, lists of boundary 
and interior nodes or degrees of freedom could also be stored. 

The major drawback of the skyline matrix structure is that it potentially requires 
more external storage than a sparsely stored matrix. In cases where external storage is at 
a premium relative to memory utilization and I/O activity, the sparse matrix makes more 
sense. However, these cases appear only rarely and do not present a sufficient impediment 
to preclude the adoption of a skyline matrix format for the stated reasons of efficiency and 
flexibility. 

The steps necessary to implement a skyline matrix system in the CSM Testbed closely 
parallel those outlined in Section 4.1 and include: 

1) Construction of a flexible skyline matrix assembly program. As noted 
in the discussions of Section 3, an assembler processor capable of as- 
sembling element matrices, full square matrices and skyline matrices 
of different orders and topologies would be ideal. Also, a capability to 
assemble vector blocks representing the off-diagonal coupling terms 
(K,'&) of substructured matrices should be included, along with the 
capability to process d.o.f.-elimination type constraints. The develop- 
mental steps necessary to implement these capabilities in the Testbed 
are: 

a) Construction of modular subroutine or function entry 
points to access Testbed element connectivity and matrix 
data. 

b) Construction of a basic skyline matrix assembler processor 
to provide the same functionality as the present Testbed 
sparse matrix assemblers K, KG and M plus the capability 
to include previously assembled skyline matrices in a new 
skyline matrix. 

c) Addition of substructuring capability to the skyline ma- 
trix assembler to assemble the off-diagonal matrix blocks. 
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d) Implementation of a means to define equivalenced con- 
straint relations in some Testbed modeling processor and 
to assemble skyline matrices subject to these constraints 
in the skyline matrix assembler processor. 

2) Construction of a utility-oriented skyline matrix processor to provide 
all matrix functions of the Z-system (Section 3.2.5) and decoupled 
forward-reduction and backward-substitution functions to assist in 
the efficient implementation of substructuring procedures. This sys- 
tem would ideally be based on the present SKYNOM software to 
take best advantage of previous developments, particularly in the dy- 
namic blocking of out-of-core matrix operations. The development 
steps necessary to implement this facility are: 

a) Extract SKYNOM kernel routines and place them in a 
utility-oriented processor shell constructed for use with 
the Testbed. 

b) Implement dynamic equation suppression and enabling for 
unfactored skyline matrices. 

c) Implement a staged-factoring procedure to support p- 
version finite element development. 

The functionality of the Testbed system would not be impaired during the execution of 
the above development steps. To ensure this, work would progress on the basic capabilities 
outlined in the above steps 1(a), l(b) and 2(a) simultaneously. Once these steps were 
completed, the skyline matrix system would encompass all of the present Testbed matrix 
capabilities and could thus replace it entirely. Further developments would proceed in an 
evolutionary manner with new capabilities (items 1(c), 1(d), 2(b) and 2(d), above) coming 
on-line as available. 

One should note that the prior existence of and experience with skyline matrix software 
like SKYNOM and the STAGS Z-System are substantial benefits to the economy of the 
implementation effort required for a Testbed skyline matrix capability. 

4.3 Generic Environment for CSM Matrix Methods Development 

Further extendability of matrix methods in the CSM Testbed would be greatly facili- 
tated by a computational environment that would provide many useful functions to matrix 
methods and algorithm developers. Such functions might include straightforward access 
to element and system vector data, powerful utilities for local data management, and fa- 
cilities for command-language interpretation. Collection of such functions under a single 
processor umbrella would aid Testbed algorithm developers by unifying the context within 
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which diverse matrix methods would be used and providing a capability to use diverse ma- 
trix data structures and software without changing matrix processor syntax or contextual 
assumptions ( e.g ., how to access vector data). This section discusses the basic design of 
such a system, called the Generic Environment for Matrix- Processing, or GEM-P. 


Key features of a generic environment for matrix processing include: 


Ability to “plug-in” software for conventional algebraic operations 
with various matrix data structures. This Feature would put new 
matrix structures and computational methods directly into the hands 
of CSM algorithm developers and helps to eliminate replication of 
software for user- interaction by matrix methods developers. 

A unified system for parsing command input separate from computa- 
tional software to decouple the formulation and use of higher-level al- 
gorithms from the details of individual matrix processors, data struc- 
tures and conventions. 


• Straightforward I/O utilities to assist developers in the construction 
of out-of-core and parallel processor matrix utilities. 

• Comprehensive and flexible local data management to eliminate un- 
nessary 1/ O and ensure efficient use of memory resources without 
undue burden to matrix software developers. 

A schematic diagram of a suitable generic environment is presented in Figure 10. 
The command parser and external directive handler functions are served presently by the 
Command Language Interface Program (CLIP) segment of the Testbed architecture. To 
process a generic algebraic expression or function into a form suitable for execution, a 
detailed expression parser and checker is necessary. The form produced by this parser 
would be an execution stack that would be processed by the computational interface to 
invoke computational modules and to load and store data as necessary. A table of the 
flow of an algebraic expression through its various forms from algorithmic through the 
invocation of the actual computational rout ines is presented in Table 8. Table 8 i s i ntended 
to illustrate the existence of and distinctions between algebraic operations at different 
conceptual and software levels. 
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Table 8. Matrix Methods Flowdown Chart. 
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The execution stack, computational interface and routines and local data complex 
form the computational engine of the matrix processor. Standardized computational and 
local-data manager interfaces decouple the details of the remainder of the matrix processor 
from the details of the computational modules making the entire package generic. Symbol 
tables and a flexible expression parser are conveniences to allow algorithm developers to 
express their algorithms in an algebraic form, rather than a cryptic form like that em- 
ployed in a functional execution stack. Further environmental conveniences might include 
a numerical debugger and a comprehensive performance measurement facility. 

The natural first step to the development of the generic enviroment for matrix process- 
ing is the definition of the computational and local data manager interfaces. Once these 
have been defined, a rudimentary processor can be constructed based on algebraic data 
symbols and a functional, stack-oriented command syntax. When the expression parsing 
and checking routines are complete, they can simply be added on top of the function stack 
already in use. 
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Figure 10. Components of the Generic Environment for Matrix Processing. 
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4.4 Concluding Remarks 

The purpose of the Testbed is to promote advanced methods research and development 
for analyzing the aerospace structures of the 1990’s using parallel and vector processing 
computers. The present Testbed software must be enhanced to provide an environment for 
advanced development. New flexible and extendable matrix data structures and utilities 
are necessary to enable the Testbed to fulfill its role in the productive development of 
improved computational structural mechanics algorithms. 

A simple repackaging of the logic structures and functional organization of the SPAR 
matrix processors in a modern, structured form is of limited value. The explicit nodal 
orientation of the system matrix structure should be abandoned and all matrix operations 
should be made available in the context of a single processor, rather than in the present 
menagerie of TOPO, K, M, KG, INV, SSOL, AUS, SYN, etc. While the nodal orientation 
of the system vectors is convenient from a usage standpoint, one should recognize that 
it is the function of the finite-element formulations, not the matrix algebra software, to 
translate the governing partial differential equations into an algebraic form. Requiring the 
matrix software to conform to a specific finite-element context severely limits the algebraic 
utility of the matrix software while providing only marginal convenience in conventional 
executions. What is required is a different approach - one that ensures flexibility while 
not sacrificing functionality or efficiency. 

The advocated approach is to develop the generic environment for matrix processing, 
discussed in Section 4.2. This approach is a departure from past, functionally oriented 
approaches in that the matrix software environment is to be constructed without specific 
reference to particular matrix data structures or software kernels. Instead, an accom- 
modating standard interface set is to be used to allow multiple kernels to be referenced 
within a single processor. Such an approach ensures a contextual uniformity to the invo- 
cation of all matrix software and reduces the workload for any subsequent CSM researcher 
implementing new matrix tools since user and database interfaces are provided with the 
environment. One of the goals of this approach is to ensure that the “environmental” 
software is not unduly burdened with overhead, so that the kernels themselves can be 
optimized to produce run-time efficiency. 

Finally, while this document does not address matrix data structures other than the 
present Testbed sparse and a generic skyline structures, one should carefully consider other 
organizations as well. For instance, frontal and multifrontal approaches are rapidly gaining 
in popularity as reordering algorithms for these sparse structures are becoming more robust 
and vectorization of multifrontal solvers has been accomplished with great success. These 
matrix structures are certainly good candidates for implementation in the Testbed and 
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such an implementation would no doubt assist in the advancement of the state of the art 
in computational structural mechanics. One should be cautious, however, at leaping to 
the adoption of one particular structure in favor of all others since no one structure will 
provide both the efficiency and the flexibility necessary to develop and exercise the wide 
variety of CSM algorithms expected to result from research and development use of the 
Testbed. 
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